import pandas as pd
import numpy as np
airbnb = pd.read_csv('airbnb.csv') # For Seattle only
As with any new dataset, let's start by familiarizing ourselves with it.
Go ahead and check it out for yourself:
print(airbnb.shape)
print(airbnb.columns.values)
airbnb.sample()
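Beyond the shape and column names, a few other pandas calls help when getting to know a dataset. The sketch below uses a tiny made-up DataFrame (the values are invented for illustration and are not taken from airbnb.csv), so it runs without the file.

```python
import pandas as pd

# Toy stand-in for the Airbnb data; all values are invented for illustration
toy = pd.DataFrame({
    'name': ['Cozy loft', 'Garden suite', 'Downtown studio', None],
    'room_type': ['Entire home/apt', 'Private room', 'Entire home/apt', 'Shared room'],
    'price': [120, 65, 150, 30],
    'availability_365': [200, 365, 40, 10],
})

print(toy.dtypes)            # data type of each column
missing = toy.isna().sum()   # number of missing values per column
print(missing)
print(toy.describe())        # summary statistics for the numeric columns
```

The same three calls should work unchanged on the real airbnb DataFrame.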
Our imports: note that we'll import plotly express under the alias px.
px is a fantastic "wrapper" for the base plotly package. That means we can call easy, readable functions, and plotly express will do the hard work of converting that input into the format the underlying library understands.
Quick aside: If you're a web developer and love JS, or an academic and use R, the same Plotly API is available in both languages.
import plotly
import plotly.express as px
Let's start off with a simple scatter plot, which we can whip up with px.scatter()
What does the association between price and availability look like?
airbnb_sample = airbnb.tail(1500)
fig = px.scatter(airbnb_sample, x='availability_365', y='price')
fig.show('notebook')
It works, but it doesn't really tell us much. Let's modify the plot by adding parameters to px.scatter().
With any Python package, we can pull up some quick documentation from within Jupyter by appending ? to a name.
Try it out: What parameters does px.scatter accept?
px.scatter?
fig = px.scatter(airbnb_sample, x='availability_365', y='price',
opacity=0.3, marginal_y = 'histogram',
color='room_type',
)
fig.show('notebook')
So we're still not seeing much of a clear trend here, bummer.
There are, however, quite a few outliers in the price. Let's see if we can adjust our graph so the rest of the data isn't squished down.
fig = px.scatter(airbnb_sample, x='availability_365', y='price',
opacity=0.3, log_y=True,  # pass the boolean True, not the string 'True'
color='room_type',
)
fig.show('notebook')
fig = px.scatter(airbnb_sample, x='availability_365', y='price',
opacity=0.3, range_y = (1,1050),
marginal_y = 'histogram', marginal_x = 'histogram',
color='room_type',
)
fig.show('notebook')
Peep the histogram on the right: it shows a pretty neat trend across the room types. We can check that out in more depth later.
Those outliers were causing us a bit of trouble, but they weren't too hard to deal with.
But that does make me a bit curious: What was so special about those listings?
The power of Plotly is that we can use its interactivity to simply hover over the data points and see what's going on.
All we have to do is tell it which features to display.
See if you can find out which parameters can be used to show text on hover:
fig = px.scatter(airbnb_sample, x='availability_365', y='price',
opacity=0.3, color='room_type', log_y=True,
hover_name='name', hover_data=['neighbourhood_group', 'number_of_reviews']
)
fig.show('notebook')
Plotly is interactive! Play around with the legend and the plot area.
Double-click an entry in the legend on the right, and Plotly will automatically update the figure to show only those points. If we want
We can change our colors fairly easily using color scales.
If the feature we pass to color= is discrete or categorical, we set the color_discrete_sequence parameter.
If the feature is continuous, we use the color_continuous_scale parameter instead.
Open the docs, and try out your favorite below:
fig = px.scatter(airbnb_sample, x='availability_365', y='price',
opacity=0.3, color='room_type', log_y=True,
hover_name='name', hover_data=['neighbourhood_group', 'number_of_reviews'],
color_discrete_sequence=plotly.colors.qualitative.Prism,
)
fig.show('notebook')
Under the hood, each of these sequences is just a list of colors, so we can slice it to use a different subset of values:
plotly.colors.qualitative.Prism[2:6]
To finish off, we can add titles, labels, and so on pretty easily.
See if you can use the function documentation or Google to figure out how to do that:
fig = px.scatter(airbnb_sample, x='availability_365', y='price',
opacity=0.3, color='room_type', log_y=True,
hover_name='name', hover_data=['neighbourhood_group', 'number_of_reviews'],
color_discrete_sequence=plotly.colors.sequential.BuPu_r[3:],
title='Seattle Airbnb Prices vs. Demand, Broken Down by Room Type',
labels={'availability_365':'Days Available Per Year',
'price':'Nightly Rate ($)',
'room_type':'Type of Room'},
)
fig.show('notebook')
We can check out other basic features of the dataset by constructing quick bar plots and histograms.
In breakout groups, see if you can:
Make these complete! Add axis labels, hover text, and other columns of data if you can.
Hint: You may need to use the GroupBy functions from Week 2 to aggregate the data more comfortably.
# Median of the numeric columns of interest per neighbourhood; selecting the columns
# before aggregating avoids errors on text columns in recent pandas versions
cols = ['price', 'minimum_nights', 'number_of_reviews', 'availability_365']
airbnb_byN = airbnb.groupby(by=['neighbourhood', 'neighbourhood_group'])[cols].median().reset_index()
airbnb_byN
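If the GroupBy step feels opaque, here is the same pattern on a tiny made-up DataFrame where the medians can be checked by hand (group names 'A' and 'B' and all values are invented for illustration):

```python
import pandas as pd

# Toy listings in two placeholder regions
toy = pd.DataFrame({
    'neighbourhood_group': ['A', 'A', 'A', 'B', 'B'],
    'price': [100, 200, 300, 50, 150],
    'number_of_reviews': [1, 5, 9, 2, 4],
})

# Group rows by region, take the median of each numeric column,
# then turn the group key back into a regular column
toy_by_group = (toy.groupby('neighbourhood_group')[['price', 'number_of_reviews']]
                   .median()
                   .reset_index())
print(toy_by_group)
```

The reset_index() call matters for plotting: plotly express looks columns up by name, so the group key needs to be a column and not the index.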
#1
fig = px.bar(airbnb_byN,
x='neighbourhood', y='price',
log_y=False,
color='neighbourhood_group',
hover_name='neighbourhood_group',
color_discrete_sequence=plotly.colors.qualitative.Prism,
title='Seattle Airbnb Prices across Neighborhoods',
labels={'availability_365':'Days Available Per Year',
'price':'Nightly Rate ($)',
'neighbourhood':'Neighborhood',
'neighbourhood_group':'Region'
},
)
fig.show('notebook')
# 2
Aside from scatter and bar plots, there's quite a lot else we can make.
Check it out here: https://plotly.com/python/
As you'll see, some of the documentation uses graph_objects as go, or figure_factory as ff. What does that mean?
Plotly was originally written using a base API, in which each "layer" was constructed individually.
Before, you'd have to spell out what type of object was being drawn, pass the exact data as a list, and so on.
Plotly Express "abstracted" all this away. It provides quick functions that generate the necessary layers in the backend, saving us quite a bit of time.
We can still use some of these base features to add more customization to our graph. For example, with our previous barplot:
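# numeric_only=True skips text columns, which recent pandas versions refuse to take a median of
airbnb_byG = airbnb.groupby('neighbourhood_group').median(numeric_only=True).reset_index()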
airbnb_byG.head()
fig = px.bar(airbnb_byG,
x='neighbourhood_group', y='price',
log_y=False,
color='neighbourhood_group',
hover_name='neighbourhood_group',
color_discrete_sequence=plotly.colors.qualitative.Prism,
title='Seattle Airbnb Prices across Neighborhoods',
labels={'availability_365':'Days Available Per Year',
'price':'Nightly Rate ($)',
'neighbourhood_group':'Region'},
text='price',
)
fig.update_traces(texttemplate='$'+'%{text:.2s}', textposition='outside')
fig.show('notebook')
# Add Figure Annotations w/ additional layers